Distance between two distributions

Total Variation Distance (TVD, or variation distance or statistical distance)

= the maximum absolute difference in probabilities assigned to any single event by the two distributions.

$$d_{TVD}(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|$$

where x ranges over all possible values in the sample space

- range in [0, 1], where 1 = no events in common
- symmetric
- robust to outliers and can capture the largest discrepancy between discrete/continuous distributions

Wasserstein Distance (Earth Mover's Distance or Kantorovich-Rubinstein distance)

= the minimum amount of work needed to transform one distribution into the other, where "work" is defined in terms of the cost of moving mass from one point to another.

$$W_p(P, Q) = \left( \inf_{\gamma \in \Pi(P, Q)} \int_{\mathcal{X} \times \mathcal{X}} d(x, y)^p \, d\gamma(x, y) \right)^{\frac{1}{p}}$$
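A minimal sketch of both distances for discrete distributions, assuming two probability vectors `p` and `q` over the same support (the names and values are illustrative); the 1-Wasserstein case uses `scipy.stats.wasserstein_distance`:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two example discrete distributions over the same support (illustrative values)
support = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.4, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.2, 0.2])

# Total variation distance: half the L1 distance between the probability vectors
tvd = 0.5 * np.sum(np.abs(p - q))

# 1-Wasserstein (earth mover's) distance: same support values, weighted by p and q
w1 = wasserstein_distance(support, support, u_weights=p, v_weights=q)

print(f"TVD = {tvd:.3f}, W1 = {w1:.3f}")
```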

Hellinger Distance

= the similarity between two probability distributions, measured via the squared differences between the square roots of their probability densities.

$$H^2(P, Q) = \frac{1}{2} \sum_x \left( \sqrt{P(x)} - \sqrt{Q(x)} \right)^2$$
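A minimal sketch of the formula above, assuming discrete probability vectors `p` and `q` over the same support (illustrative names and values):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (vectors summing to 1)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    # H^2 = 1/2 * sum((sqrt(p) - sqrt(q))^2); return H, which lies in [0, 1]
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

print(hellinger([0.1, 0.4, 0.4, 0.1], [0.3, 0.3, 0.2, 0.2]))
```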

Kullback-Leibler Divergence (KL Divergence)

see Information theory and Entropy in Neuroscience#Kullback-Leibler Divergence

Jensen-Shannon Divergence

= a symmetrized version of KL divergence that measures the similarity between two distributions by averaging their KL divergences from the average of the two distributions.

$$D_{JS}(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)$$

where $M = \frac{1}{2}(P + Q)$ is the average of the two distributions

- range in [0, log(2)]
- symmetric
- captures both the divergence and convergence between distributions
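A minimal sketch of the formula above, assuming discrete probability vectors; `scipy.special.rel_entr` gives the elementwise terms of the KL divergence, and `scipy.spatial.distance.jensenshannon` (which returns the square root of the divergence) is used only as a cross-check:

```python
import numpy as np
from scipy.special import rel_entr
from scipy.spatial.distance import jensenshannon

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), in [0, log(2)]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)                # mixture M = (P + Q) / 2
    kl_pm = np.sum(rel_entr(p, m))   # D_KL(P || M)
    kl_qm = np.sum(rel_entr(q, m))   # D_KL(Q || M)
    return 0.5 * kl_pm + 0.5 * kl_qm

p = [0.1, 0.4, 0.4, 0.1]
q = [0.3, 0.3, 0.2, 0.2]
print(js_divergence(p, q))           # direct computation from the formula
print(jensenshannon(p, q) ** 2)      # SciPy returns the JS *distance*, i.e. sqrt of the divergence
```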